SC4020 - Data Analytics and Mining
Course Summary
This course covers Natural Language Processing in depth. At the time of taking the module, the lecturers were Dr Sun Aixin (first half) and Dr Wang Wenya (second half). The first half is more theoretical, covering RegEx, Text Normalisation and Edit Distance, N-gram Language Models, Parts of Speech and Named Entity Recognition, Constituency Grammars and Parsing, and Dependency Parsing. The second half is more relevant to NLP work in industry, with topics including ML and DL, Word Vectors, Language and Sequence Modelling, Attention Mechanisms, Transformers and Pretraining. The first half is very slow-paced in both lecturing and content, while the second half is the direct opposite: the lecturer speeds through the lectures like a train.
Workload
The workload is manageable: a 2-hour lecture followed by a 1-hour tutorial each week. The tutorials are moderately open-ended and don't take too long to finish, but the second-half content is quite tough, especially the equations and maths, which can get very confusing. Tutorials are not recorded.
Projects
There is one test (15%) on the first-half content, held right after recess week in Week 8; a project (35%) on the second-half content, issued during recess week and due in Week 12; and a final exam (50%).
Tips to Do Well
The quiz (the Week 8 test) is reasonably manageable, but because of the nature and structure of the questions, it's easy to lose a lot of marks unfairly. If I'm not wrong, the average is around 19/30. The project is done in groups of 5-6 and can be quite tough, as you have to prepare and train multiple models on several datasets. The trickiest part is that the work is sequential, so you have to wait for your groupmates to finish their parts before you can start yours, which is a concern given the tight timeframe. The exam is a toss-up: the range of content that can be tested is huge, so it's easy to forget certain concepts, especially when a question is tweaked significantly.
Based on reviews by DMU, DMU, JT